Abstract.

This paper addresses the automatic recognition of handwritten temperature
values in weather records. The localization of table cells is based
on line detection using projection profiles. Further, a stroke-preserving 
line removal method which is based on gradient images is proposed. The 
presented digit recognition utilizes features which are extracted using a 
set of filters and a Support Vector Machine classifier. It was evaluated 
on the MNIST and the USPS dataset and our own database with about 
17,000 RGB digit images. An accuracy of 99.36% per digit is achieved for 
the entire system using a set of 84 weather records. 
1 Introduction

In addition to handwritten documents stored in historic archives there are pro- 
cesses, like manually filling out forms, which still depend on pen and paper. 
However, accessing the information stored in those documents requires time 
and manpower [11]. By digitizing those documents, the advantages of digitally
stored information, such as ease of access, can be exploited for handwritten 
documents [9]. 

The documents covered in this paper, i.e. weather records, consist of known 
printed forms with handwritten information. Even though the structure of the 
form is known beforehand, due to prior handling and the scanning process, 
global and local deformations are introduced. Further, the structure of forms 
suggests certain areas which are applicable for handwriting, yet writers are not 
bound to this constraint. Additionally, handwriting varies significantly due 
to different writing styles. Even the handwriting of a single writer exhibits 
variations. In Figure 1 an example of a weather record is shown. The origins
of the weather records regarded in this paper are several measurement stations 
located in Austria (Lower Austria). 
Figure 1: Weather record from February of the year 2000 from a measurement
station in Lower Austria. The temperature values are located in the column 
"Lufttemperatur °C". 
The localization of the numerical data is conducted by first reconstructing
the tabular structure of the form. Using vertical and horizontal projection
profiles, the rough positions of the lines building up the table are found. Errors
in the reconstruction are detected using so-called Form Properties, which store
a-priori information about the form. Finally, the extraction of the digits and signs
is done using a binarization based on the Savakis filter [10] and subsequent
Connected Component (CC) analysis.

The handwritten digit recognition is based on the work of Labusch et al. [6].
Using Principal Component Analysis (PCA) the features of the normalized
characters are extracted. As a classifier a multiclass Radial Basis Function (RBF)
Support Vector Machine (SVM) is used.

The paper is structured as follows: in the following section related work on
digit recognition and document analysis for digit recognition is presented. In
Section 3 the localization of the numerical data is explained in detail.
Subsequently, the digit recognition approach is shown in Section 4. Finally, in
Section 5 an evaluation of the presented method is conducted and final remarks
are given in Section 6.

2 Related Work 

Earlier systems for recognizing handwritten digits in documents include, for
instance, systems for US addresses [11] and census forms [4]. More recently, Bulacu et al. [1]
presented a system for recognizing handwritten digits in historic documents 
from the archive of the Cabinet of the Dutch Queen. Further, Richarz et al. [9] 
proposed a semi-supervised method for the transcription of historic weather 
documents. 

For digit recognition Liu et al. [8] presented an approach using local stroke 
directions of the handwritten digits as features. These features are extracted 
from gradient directions. Teow et al. [14] proposed a feature extraction approach
which is inspired by the biological vision system. They use 16 filters designed to
detect edges and end-stops of various orientations. These filters simulate the
behaviour of receptive fields in the visual system. Another vision-based approach
is proposed by Labusch et al. [6]. They proposed a feature extraction method
based on learned sparse representations. Keysers et al. [5] proposed an image
matching approach for digit recognition. The main idea is to map the pixels of 
a test image onto the pixels of a reference image. The quality of the mapping 
is determined by a distance function which is used in combination with a k-NN 
classifier to predict the class label of the test image. 

As shown by LeCun et al. [7], it is possible to achieve state-of-the-art
recognition accuracies without using a hand-crafted feature extractor by instead
incorporating the feature extraction process in the (trainable) classifier. They
proposed a specialized Neural Network (NN), the so-called Convolutional Neural
Network (CNN), with alternating convolution and subsampling layers. Ciresan
et al. further improved this architecture by proposing so-called Deep NNs and
by combining multiple Deep NNs [2].

3 Document Analysis for Digit Recognition 

In order to extract the fields containing the numerical data, i.e. the signs and 
digits, they have to be located beforehand. The documents in scope of this 
paper, i.e. the weather records, are forms which have been filled out manually. 
Therefore, an underlying tabular structure is present within the documents. In
this section, first an approach using line detection for reconstructing the form
is presented. Second, a method for stroke-preserving removal of form lines is
shown. Third, the segmentation of the digits and signs is depicted.

Figure 2: Image showing the vertical projection profile with the detected peaks
and part of the image in the background.

3.1 Table Reconstruction 

The first step for reconstructing the table structure is finding vertical and hori- 
zontal lines in the image. Starting from a rotation corrected grayscale image [3] 
vertical and horizontal projection profiles are used to find the vertical and hor- 
izontal lines, respectively. 

As shown in Figure 2, the peaks in the vertical projection correspond to the
vertical lines. By introducing a minimum horizontal distance, the peaks are
searched in local areas. The blue circles in Figure 2 depict the detected peaks,
which are used for further processing.
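
The peak search on a vertical projection profile can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the parameters min_distance and min_strength are assumptions, and the image is assumed to show dark lines on bright paper.

```python
import numpy as np

def vertical_line_positions(gray, min_distance, min_strength):
    """Return x positions of dark vertical lines in a grayscale image.

    gray: 2-D array, dark ink on bright paper (values 0..255).
    min_distance: minimum horizontal distance between two accepted peaks.
    min_strength: minimum profile value for a column to count as a peak.
    Both parameters are illustrative assumptions.
    """
    # Invert so that ink contributes large values, then sum each column.
    profile = (255.0 - gray.astype(float)).sum(axis=0)

    # Visit columns from strongest to weakest and keep a column only if
    # no stronger peak has already been accepted close to it.
    peaks = []
    for x in np.argsort(profile)[::-1]:
        if profile[x] < min_strength:
            break
        if all(abs(int(x) - p) >= min_distance for p in peaks):
            peaks.append(int(x))
    return sorted(peaks)
```

Horizontal lines can be found the same way by summing along the other axis.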

To cope with errors in the line detection process, a-priori information about 
the form, so-called Form Properties, is used to correct the detected lines. For 
instance, horizontal lines are added if there are gaps contradicting the Form 
Properties. 

3.2 Line Removal 

To segment the single digits and signs it is necessary to remove the form borders. 
However, due to cursively written digits and signs, part of the symbols may 
cross the borders. To preserve the characteristics of the digits, a line removal 
preserving handwritten strokes is applied. 

This is done using a Wiener filter on vertical and horizontal gradient images.
The Wiener filter is a reconstruction filter. Starting from a filtered (or
degraded) image $G(u,v)$, which is filtered with a known filter function $H(u,v)$
(in this case a vertical or horizontal Sobel filter), the task is to reproduce
the original unfiltered image $F(u,v)$.

To cope with noise, the Wiener filter extends the inverse filtering approach
by modelling the noise and the image as two random variables. The Wiener
filter minimizes the mean square error between the reconstructed image
$\hat{F}(u,v)$ and the original image $F(u,v)$. This minimum is given in the
frequency domain as

$$\hat{F}(u,v) = \left[ \frac{1}{H(u,v)} \, \frac{|H(u,v)|^2}{|H(u,v)|^2 + K} \right] G(u,v) \qquad (1)$$

where $|H(u,v)|^2$ is defined as the product of the filter function with its
complex conjugate. The parameter K represents the inverse of the
signal-to-noise ratio, i.e. the power spectrum of the noise $|N(u,v)|^2$ divided
by the power spectrum of the original image $|F(u,v)|^2$. The term in the
brackets in Equation 1 is called the Wiener filter.

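
A minimal sketch of Wiener deconvolution in the frequency domain is given below. It assumes circular convolution (plain FFTs without border handling) and a constant K; the function name is hypothetical and the code is not taken from the paper.

```python
import numpy as np

def wiener_deconvolve(g, h, K):
    """Estimate f from g = f (*) h, where (*) is circular convolution.

    g: degraded image (here: a gradient image), 2-D array.
    h: known filter kernel, 2-D array no larger than g.
    K: constant noise-to-signal power ratio; use K > 0 whenever the
       transfer function of h has zeros (as a Sobel kernel does).
    """
    G = np.fft.fft2(g)
    H = np.fft.fft2(h, s=g.shape)      # zero-pad the kernel to image size
    H2 = np.abs(H) ** 2                # |H|^2 = H * conj(H)
    # (1/H) * |H|^2 / (|H|^2 + K) simplifies to conj(H) / (|H|^2 + K),
    # which stays finite where H vanishes as long as K > 0.
    W = np.conj(H) / (H2 + K)
    return np.real(np.fft.ifft2(W * G))
```

With a Sobel kernel that differentiates along the vertical axis, every frequency component that is constant along that axis has H = 0 and is therefore not restored, which is why full-height vertical lines stay removed.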
The parameter K can be used to limit the reconstruction effect of the Wiener
filter. Additionally, by computing the gradient the constant term of the image
is lost, which is advantageous in case of border removal. In Figure 3 the whole
process is visualized. In order to remove the vertical lines in the original image
(a), first the vertical gradient is computed. As shown in (b) this removes the
vertical lines. To reconstruct the digits, the Wiener filter is used. In (c) the inner
area of the digit '1' is reconstructed while the vertical lines remain removed.
After that, median filtering (see (d)) and intensity scaling are used to remove
artifacts and enhance the result (see (e)).




Figure 3: The five states of border removal using the Wiener filter (a-e). The 
original image (a), gradient image (in this case vertical) (b), gradient after 
deconvolution (c), median filtered deconvolution (d) and resulting image after 
scaling the intensity values (e). 

However, only the regions around the vertical and horizontal lines have to 
be altered. This means, in the final image, these are the only regions which 
are replaced while the rest of the image stays the same. This way, regions not 
affected by horizontal or vertical lines retain the initial image quality. 

The rough line regions are defined by the reconstructed form and, addition- 
ally, an offset is used to extend the search area to cope with local variations. 
The exact location of the lines is subsequently found by searching for peaks in 
the gradient images. This is carried out locally for each column and row in the 
Region of Interest (ROI), respectively.



3.3 Segmentation 

In the next step the background is removed using a threshold image. The
threshold image is computed using an adapted version of the Savakis filter [10].
In distinct 15x15 blocks the image pixels are grouped into foreground pixels $v_i$
and background pixels $w_j$ using a precomputed threshold (Otsu). The local
threshold $t$ is then defined as the boundary separating the mean of the intensity
values of the foreground pixels $\bar{v}$ and the mean of the intensity values of the
background pixels $\bar{w}$:

$$t = \frac{\bar{v} + \bar{w}}{2}, \quad \text{with} \quad \bar{v} = \frac{1}{n} \sum_{i=1}^{n} v_i \quad \text{and} \quad \bar{w} = \frac{1}{m} \sum_{j=1}^{m} w_j \qquad (2)$$

where $n$ and $m$ denote the number of foreground and background pixels in the block.
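
The block-wise threshold can be sketched as follows, reading "the boundary separating" the two means as their midpoint. The function name is hypothetical, the global threshold (Otsu in the paper) is passed in as a plain number, and keeping the global value for blocks that contain only one group is an assumption.

```python
import numpy as np

def local_threshold_image(gray, global_t, block=15):
    """Per-block threshold t = (mean foreground + mean background) / 2."""
    t_img = np.full(gray.shape, float(global_t))
    for y in range(0, gray.shape[0], block):
        for x in range(0, gray.shape[1], block):
            tile = gray[y:y + block, x:x + block].astype(float)
            fg = tile[tile <= global_t]   # dark pixels: candidate ink
            bg = tile[tile > global_t]    # bright pixels: paper
            # Blocks containing only one group keep the global threshold
            # (an assumption; the paper does not cover this case).
            if fg.size and bg.size:
                t_img[y:y + block, x:x + block] = (fg.mean() + bg.mean()) / 2
    return t_img
```

Pixels brighter than the corresponding entry of the threshold image would then be treated as background.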



In Figure 4 this approach is compared to two state-of-the-art binarization
approaches. Due to the high image quality of the weather records, this approach
shows fewer missing character strokes and clearer frame borders. The threshold
image is then used to remove the background pixels. Additionally, a Gaussian
filter is applied to enhance the result.

The digits and signs are then extracted by computing Connected Components
(CCs) and assigning them to the nearest cell centers. To remove noise,
CCs where the number of pixels is under a certain threshold are removed. The
last step before the character recognition is a size normalization.

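
The component extraction and cell assignment can be illustrated with a small pure-Python sketch. The 8-connectivity, the use of the component centroid as its position, and all names are assumptions made for illustration.

```python
from collections import deque

def connected_components(mask):
    """Return a list of components, each a list of (row, col) foreground
    pixels, using 8-connectivity. mask is a 2-D list of 0/1 values."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps

def assign_to_cells(comps, cell_centers, min_pixels):
    """Drop components smaller than min_pixels and map the remaining ones
    to the index of the nearest cell center (by component centroid)."""
    cells = {}
    for pixels in comps:
        if len(pixels) < min_pixels:
            continue  # treated as noise
        cy = sum(p[0] for p in pixels) / len(pixels)
        cx = sum(p[1] for p in pixels) / len(pixels)
        nearest = min(range(len(cell_centers)),
                      key=lambda i: (cell_centers[i][0] - cy) ** 2
                                    + (cell_centers[i][1] - cx) ** 2)
        cells.setdefault(nearest, []).append(pixels)
    return cells
```
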
Figure 4: Comparison of different binarization techniques on an example (a).
The proposed method (b) and approaches by Su et al. from the years 2010 [12]
(c) and 2011 [13] (d).

4 Handwritten Digit Recognition 

The digit recognition algorithm is based on the work of Labusch et al. [6]. In
this section, first the calculation of the basis functions is explained. Then, the 
feature extraction and classification are outlined. 

4.1 Basis Functions 

The basis functions are learned using 100,000 image patches. As proposed in [6] 
the size of the image patches is set to 13x13. A patch P(x, y) is extracted by 
placing it at a random position in a training image. The training images used for 
this extraction are evenly distributed among the whole training set. As stated 
by Labusch et al. [6], preprocessing is necessary for the PCA. This is done by
converting the extracted patches into so-called centered vectors [6]. First, the
mean pixel value $\bar{P}$ of the patch is subtracted:

$$S(x,y) = P(x,y) - \bar{P} \qquad (3)$$

Next, $\bar{S}(x,y)$, the mean value over all $S(x,y)$, is computed for each pixel.
Finally, the centered vectors are obtained by subtracting $\bar{S}(x,y)$ from all
$S(x,y)$:

$$S_p(x,y) = S(x,y) - \bar{S}(x,y) \qquad (4)$$

After extracting the patches, a PCA is used to learn the basis functions
from these samples, i.e. the parameters of the underlying model. This is done
by creating the covariance matrix and computing the eigenvectors and the
corresponding eigenvalues.
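
The patch sampling, the two centering steps of Equations (3) and (4), and the eigendecomposition can be sketched with NumPy as follows. The function names are hypothetical, and cycling through the training images is only one possible way to spread the patches evenly over the training set.

```python
import numpy as np

def sample_patches(images, n_patches, size, rng):
    """Cut n_patches random size x size patches, cycling through the
    images so that they are spread evenly over the training set."""
    patches = np.empty((n_patches, size, size))
    for k in range(n_patches):
        img = images[k % len(images)]
        y = rng.integers(0, img.shape[0] - size + 1)
        x = rng.integers(0, img.shape[1] - size + 1)
        patches[k] = img[y:y + size, x:x + size]
    return patches

def learn_basis_functions(patches):
    """patches: array of shape (n, h, w). Returns (eigenvalues, basis),
    sorted by decreasing eigenvalue; basis has shape (h*w, h, w)."""
    n, h, w = patches.shape
    X = patches.reshape(n, h * w).astype(float)
    X = X - X.mean(axis=1, keepdims=True)   # Eq. (3): subtract each patch's mean
    X = X - X.mean(axis=0, keepdims=True)   # Eq. (4): subtract the per-pixel mean
    cov = X.T @ X / (n - 1)                 # covariance of the centered vectors
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    basis = eigvecs[:, order].T.reshape(h * w, h, w)
    return eigvals[order], basis
```

For 13x13 patches this yields 169 basis functions, each of which can be viewed as a 13x13 filter.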

4.2 Feature Extraction and Classification 

The coefficient images are computed by successively convolving the image with
the computed eigenvectors. As a result, for each of the 13x13 basis functions, a
coefficient image $F_i$ is produced. To achieve local shift invariance, the extrema
in 9 distinct blocks in each of the $F_i$ are used as the feature vector.
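
A possible reading of this step is sketched below. Two assumptions are made that the text does not settle: the convolution keeps only the 'valid' region, and the extremum of a block is taken as its largest absolute coefficient. The function name is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def extract_features(image, basis, grid=3):
    """image: 2-D array; basis: (k, h, w) filters.
    Returns a feature vector of length k * grid * grid."""
    # All h x w windows of the image, shape (H-h+1, W-w+1, h, w).
    windows = sliding_window_view(image.astype(float), basis.shape[1:])
    features = []
    for kernel in basis:
        # 'valid' convolution: correlate the windows with the flipped kernel.
        coeff = np.einsum('ijkl,kl->ij', windows, kernel[::-1, ::-1])
        # Split the coefficient image into grid x grid distinct blocks and
        # keep the largest magnitude of each block.
        for band in np.array_split(np.abs(coeff), grid, axis=0):
            for block in np.array_split(band, grid, axis=1):
                features.append(block.max())
    return np.array(features)
```

Because only the extremum of each block is kept, a small shift that leaves a stroke response inside the same block does not change the feature vector.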

As proposed by Labusch et al. [6], an RBF is chosen as the kernel for the
SVM. The one-versus-one approach is used for handling multiple classes. This
is important for digit recognition because there are at least ten
classes and SVMs without this extension only allow binary classification.
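
The one-versus-one scheme itself is independent of the SVM and can be sketched with arbitrary binary classifiers. In the sketch below, `pairwise` is a hypothetical dictionary of already trained two-class decision functions; for ten classes it would hold 45 entries.

```python
from itertools import combinations

def one_vs_one_predict(x, classes, pairwise):
    """pairwise[(a, b)](x) returns either a or b.
    The class that wins the most pairwise comparisons is returned."""
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    return max(classes, key=lambda c: votes[c])
```

Ties are resolved here in favour of the class listed first, which is an arbitrary choice.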



5 Experiments 

In this section first the digit recognition module is evaluated using the MNIST, 
USPS and our own digit database. Subsequently, the whole process starting 
with scanned images to classified digits is evaluated on a set of weather records. 
5.1 Digit Recognition

For the evaluation of the digit recognition module three different datasets are
used. The first is the MNIST¹ database, which consists of 60,000 training images and
10,000 test images. The second is the USPS dataset, which consists of 7,291 training
samples and 2,007 test samples. Third, digit images of our own database are used for
comparison. In contrast to the previous datasets, the images are available in
RGB and no normalization procedure was applied beforehand. The digits were
written by approximately 120 different writers. The 17,322 digit images in this
database are randomly split into a training set containing 13,322 samples and a
test set with 4,000 samples.

The digit recognition module achieves an accuracy of 99.24% on the MNIST
database. This coincides with the results reported by Labusch et al. [6], on
whose method the approach is based. For the more challenging USPS database an
accuracy of 97.16% is achieved. Using the normalization applied to the MNIST
dataset, an accuracy of 96.45% is achieved on our own database. Using the
probability values of the SVM to compute a second guess, the accuracy is increased
to 99.05%. This leads to the assumption that using these probabilities in
combination with semantic information, such as the mean and standard deviation of
the temperature values at a specific time and day, can lead to more accurate
results.
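
Counting a sample as correct when the true label is among the two most probable classes can be sketched as follows. The function name and the input layout (one list of class probabilities per sample) are assumptions for illustration.

```python
def top_k_accuracy(probabilities, labels, k=2):
    """probabilities: one list of per-class probabilities per sample.
    A sample counts as correct if its label is among the k most
    probable classes (k=2 corresponds to allowing a second guess)."""
    hits = 0
    for probs, label in zip(probabilities, labels):
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)
```
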

To investigate how the size difference between the training and test sets influences
the results, an evaluation was conducted using swapped sets. On the MNIST
and our own database this leads to a decrease of the accuracy to 98.20%
and 94.75%, respectively.

5.2 Weather Records

The weather records used for evaluation were recorded in the years 1994 to 2000
at a measuring station in the province of Lower Austria. Each of the 84 documents
contains the temperature values for one month with three measurements per
day. In Figure 1 an example is shown. It has to be noted that all records were
filled out by the same writer and that for all documents a manually created
ground truth is available. However, it contains only the consolidated numbers
and no information about the individual digits or signs. For instance, the plus
sign is often omitted by writers (more than half of the time in these records) and
leading zeros can be left out. To cope with this problem, empty fields are treated
as an additional class in the case of digit recognition and as plus signs in the
case of sign recognition. The accuracy of the digit recognition is computed using:

$$\text{accuracy} = \frac{\text{number of correctly classified cells}}{\text{number of cells containing CCs}} \qquad (5)$$

In Table 1 the accuracies are depicted for two different splits of the
documents. Independent of the size difference between the training and test sets,
the accuracy is above 99%. Unfortunately, for the sign recognition there is no
ground truth stating the number of plus signs present in the documents.
However, in the first case depicted in Table 1 only three instances of falsely classified
signs are found for the whole year 1994. Assuming one sign per number, this
corresponds to an error rate of 0.3%.
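
As a rough plausibility check of this figure, assume (this is not stated explicitly for the test set) that every day of 1994 contributes three temperature values, each with one sign:

```latex
\[
  365 \times 3 = 1095 \ \text{signs}, \qquad
  \frac{3}{1095} \approx 0.0027 \approx 0.3\,\%
\]
```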

The main source of error is the segmentation process. First, artifacts not re- 
moved before the normalization stage may lead to degraded normalized images. 



¹ http://yann.lecun.com/exdb/mnist/
Trained Years   Tested Years   Tested Digit Cells   Digit Accuracy
1995-2000       1994           208                  99.10%
1997-2000       1994-1997      8,065                99.36%


Table 1: Digit cell recognition accuracies achieved using the weather records 
from the years denoted in the first two columns. 

This is for instance shown in the 5th and 7th examples in Figure 5. Second,
parts of the digit may be missing if either the line removal algorithm is not able
to preserve the handwritten strokes or in case of under-segmentation. However,
the 2nd and 3rd digits in Figure 5 show that a correct classification is
nevertheless possible in some cases. Other problems are underlinings of temperatures,
corrections and meaningless strokes, which are shown in the 1st, 4th and 6th
examples in Figure 5, respectively.
Figure 5: Examples of correctly (green) and falsely (red) classified normalized
digit cells. The numbers at the bottom depict the output of the classification. 



6 Conclusions 

In this paper an approach for the automatic recognition of temperature values 
in weather records is introduced. The segmentation of the digits is achieved by 
reconstructing the tabular structure using line detection. Using a filterbank the 
features of the digits and signs are extracted and subsequently classified with 
an SVM. The evaluation conducted on 84 weather records showed an accuracy 
of over 99% per digit. The digit recognition model is able to cope with missing 
digit strokes and underlinings. However, due to the normalization process the 
impact of artifacts can be amplified and additionally the system is not able to 
cope with meaningless strokes or corrections. 

For future work, the main performance enhancements are expected by mak- 
ing the segmentation process more robust to artifacts. 